DOCUMENT RESUME

ED 069 839                                              UD 013 068

AUTHOR          Cronk, George A., Jr.
TITLE           District Evaluator's Handbook of Selected Evaluation
                Procedures for Categorically Aided Programs Serving
                Disadvantaged Learners.
INSTITUTION     New York State Education Dept., Albany. Div. of
                Evaluation.
PUB DATE        72
NOTE            107p.

EDRS PRICE      MF-$0.65 HC-$6.58
DESCRIPTORS     Behavioral Objectives; *Compensatory Education
                Programs; Data Analysis; Data Collection;
                Disadvantaged Youth; Educational Accountability;
                Educational Resources; *Evaluation Criteria;
                *Evaluation Methods; *Evaluation Techniques; *Program
                Evaluation; Remedial Instruction; Sampling; School
                Districts; Statistical Analysis

ABSTRACT
Local district personnel are responsible for
collecting evidence that categorically aided projects have an impact
upon disadvantaged learners' behavior. The district personnel
requested assistance in designing evaluation methods to meet their
needs. In keeping with the State Education Department's policy of
maximizing service to the field, this handbook was developed by the
Bureau of Urban and Community Programs Evaluation to help local
coordinators assemble defensible data and provide the best
information for the decision makers who must select treatments for
their respective disadvantaged learner population. The contents of
the handbook were assembled in a format that outlines application
only. It provides selected applications as they seem relevant to the
construction of behavioral objectives, the development of defensible
sampling plans, and the analysis of data collected under definable
evaluation designs. In addition, an appendix provides both actual
illustrations of evaluation designs currently being applied to Title
I ESEA projects and an evaluation flow chart for planning.
(Author/JM)



U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
OFFICE OF EDUCATION
THIS DOCUMENT HAS BEEN REPRODUCED
EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING
IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY
REPRESENT OFFICIAL OFFICE OF EDUCATION
POSITION OR POLICY.



District Evaluator's Handbook 
of Selected Evaluation Procedures for 
Categorically Aided Programs Serving 
Disadvantaged Learners 



SPRING 1972 







The University of the State of New York 
THE STATE EDUCATION DEPARTMENT 
Division of Evaluation, Bureau of 
Urban and Community Programs Evaluation 
Albany, New York 12224 




THE UNIVERSITY OF THE STATE OF NEW YORK 






Regents of the University (with years when terms expire)

1984 Joseph W. McGovern, A.B., J.D., L.H.D., LL.D., D.C.L.,
     Chancellor                                          New York
1985 Everett J. Penny, B.C.S., D.C.S.,
     Vice Chancellor                                     White Plains
1978 Alexander J. Allan, Jr., LL.D., Litt.D.             Troy
1973 Charles W. Millard, Jr., A.B., LL.D., L.H.D.        Buffalo
1987 Carl H. Pforzheimer, Jr., A.B., M.B.A., D.C.S.,
     H.H.D.                                              Purchase
1975 Edward M. M. Warburg, B.S., L.H.D.                  New York
1977 Joseph T. King, LL.B.                               Queens
1974 Joseph C. Indelicato, M.D.                          Brooklyn
1976 Mrs. Helen B. Power, A.B., Litt.D., L.H.D., LL.D.   Rochester
1979 Francis W. McGinley, B.S., J.D., LL.D.              Glens Falls
1980 Max J. Rubin, LL.B., L.H.D.                         New York
1986 Kenneth B. Clark, A.B., M.S., Ph.D., LL.D.,
     L.H.D., D.Sc.                                       Hastings on Hudson
1982 Stephen K. Bailey, A.B., B.A., Ph.D., LL.D.         Syracuse
1983 Harold E. Newcomb, B.A.                             Owego
1981 Theodore M. Black, A.B., Litt.D.                    Sands Point

President of the University and Commissioner of Education
Ewald B. Nyquist

Executive Deputy Commissioner of Education
Gordon M. Ambach

Deputy Commissioner for Elementary, Secondary, and Continuing Education
Thomas D. Sheldon

Assistant Commissioner for Compensatory Education
Irving Ratchick

Director, Division of Education for the Disadvantaged
Louis J. Pasquini

Assistant Director, Division of Urban Education
John L. House

Assistant Director, Division of Urban Education
Richard S. Weiner

Chief, Bureau of Education Field Services for the Disadvantaged
William C. Flannigan

Chief, Bureau of Education Program Services
Paul M. Hughes

Associate Commissioner for Research and Evaluation
Lorne H. Woollatt

Director, Division of Evaluation
Alan G. Robertson

Chief, Bureau of Urban and Community Programs Evaluation
Leo D. Doherty






FOREWORD 



Local district personnel are responsible for collecting evidence
that categorically aided projects have an impact upon disadvantaged
learners' behavior. The district personnel requested assistance in designing
evaluation methods to meet their needs. In keeping with the State Education
Department's policy of maximizing service to the field, this handbook
was developed by the Bureau of Urban and Community Programs Evaluation
to help local coordinators assemble defensible data and provide the
best information for the decision makers who must select treatments
for their respective disadvantaged learner populations.

The Intent 

The handbook was written for district personnel who may be either 
novices or experts in the use of education research techniques. The 
handbook is tailored to projects for disadvantaged learners, is filled
with illustrations created for typical projects, and contains some
techniques that will isolate specific activity effects as reflected by pupil
achievement.

The contents of the handbook were assembled in a format that out- 
lines application only. If a coordinator needs to review statistical 
concepts such as the theory of the characteristics of the normal curve, 
random events and probability, and parametric and nonparametric tests, 
he is advised to obtain one of the references included in the bibliography 
of this handbook. The handbook does not develop concepts underlying 
inferential statistics. 




The handbook provides selected applications as they seem relevant 
to the construction of behavioral objectives, the development of defensible 
sampling plans, and the analysis of data collected under definable evalu- 
ation designs. In addition, an appendix provides both actual illustrations 
of evaluation designs currently being applied to Title I projects and 
an evaluation flow chart for planning. 

The handbook was designed to be assembled in loose-leaf fashion.
As a working handbook, it will be changing constantly. As new designs
develop and become verified as appropriate, they will be sent to the 
local district coordinator as inserts. 

Evaluation Resources 

Local school district personnel, charged with the evaluation of 
compensatory aid programs, occasionally require assistance in order to 
complete the process of evaluation design and data analysis in keeping with 
the sequence of events associated with a project for disadvantaged learners. 
Sometimes such assistance is secured within the district staff from mathe- 
matics teachers, guidance counselors, and school psychologists and other 
staff who have had training in tests, measurements, and statistics. 

External resources which are available for assistance include the
Title III, ESEA, Regional Centers; the BOCES Centers; local universities,
especially those in the State University and City University systems; and
education research organizations, either university-based or independent.
Various university departments, particularly of educational or general 
psychology, guidance, or research and evaluation may be of service in the 
design of appropriate evaluation procedures. Graduate students registered 
in such departments have been so employed, but only upon the personal 
recommendation of recognized, competent faculty members. Experienced
Title I or Urban Education coordinators from other districts are sometimes
available to provide assistance.

The Department's Bureau of Urban and Community Programs Evaluation 
can help with the general construction of evaluation designs. 

In the present handbook, appendix D presents an Evaluation Flow Chart
for Title I, ESEA and Urban Education project planning. Certain of the
steps indicated (particularly needs assessment) are the definite
responsibility of the Title I or Urban Education coordinator. In some of the
succeeding steps, the coordinator should be able to provide raw achievement
or monitoring data to whoever will be carrying out the evaluation procedures.
This approach is especially important if the school district decides to
hire an outside evaluator. Contractors are used most efficiently when
their efforts are limited to constructing sampling plans and evaluation
designs and performing data analysis services, rather than collecting raw data.

Acknowledgements 

Accountability has tremendous importance in the field of compensatory 
education. Achievement evaluation, a tool of accountability, is being 
continuously reshaped as processes develop to analyze the effects of 
categorically aided programs. This handbook is devoted entirely to 
evaluation procedures tailored to special programs for disadvantaged 
learners. 

The material was compiled and written by George A. Cronk, Jr.,
associate in education research in the Bureau of Urban and Community Programs
Evaluation. During the editing, Robert F. Miller, William Jaffarian,
Lee Wolfe, and Donald White contributed valuable suggestions and counsel.



TABLE OF CONTENTS 

CHAPTER I: CONSTRUCTION OF BEHAVIORAL OBJECTIVES

CHAPTER II: DEVELOPMENT AND APPLICATION OF A SAMPLING PLAN
    Defining the Target Population
    Using a Table of Random Numbers
    A Quick Method for Approximating the Needed Treatment Group
        Sample Size When a Nontreatment Group Will Be Used for
        Comparison
    Determining Approximate Sample Sizes Based Upon the Desired
        Degree of Association for Uncorrelated Samples
    Estimation of a Required Sample Size When Testing Treatment Means

CHAPTER III: METHODOLOGY AND MANAGEMENT PLAN
    Developing an Evaluation Design
    Scheduling and Managing Data Collection
    Specifying the Instrumentation

CHAPTER IV: APPLICATION OF DATA ANALYSIS TECHNIQUES
    Describing Change Through Descriptive Statistics
    Interpretations of Norm Scores
        Standard Scores
        Stanine Scores
        Percentile Ranks
    Large Population Statistical Analysis
        z Ratio Applied to Uncorrelated Stanine Means, Posttest Only
        Using a Correlated t Ratio on Percentile Scores for a
            Modified Real v. Anticipated Gain Design
    Small Sample Statistical Analysis
        Applying a t Ratio to the Difference Between a Pretest and
            Posttest (Correlated Sample)
        Actual Posttest Comparison to the Predicted Posttest Scheme
            of Data Analysis Using a t Ratio
        Applying a t Test to the Difference Between Two Posttests
            for Independent Samples
    Analysis of Variance (ANOVA)
    Interpretation of Decision Making
    Analysis of Covariance
    Summary for Analysis of Covariance (ANCOVA)
    The Median Test for Two Correlated Samples
    The Median Test for Two Independent Samples
    Wilcoxon Matched-Pairs Signed-Rank Test for Two Correlated
        Samples (N<25)
    Chi Square (χ²)
    Describing Relationships Through Statistical Correlations
        Pearson Product-Moment Correlation Coefficient
        Use of the Point Biserial Correlation
        Use of the t Statistic to Account for the Treatment Impact
    Comments to Coordinators

APPENDIX A - Instructional Activity
APPENDIX B - Support Services
APPENDIX C - Instructional Activity (Summer)
APPENDIX D - Evaluation Flow Chart for Title I Project Planning

BIBLIOGRAPHY






CHAPTER I: CONSTRUCTION OF BEHAVIORAL OBJECTIVES

There are at least five separate facets of project proposal evaluation
plans that must be addressed by project proposal writers. The Bureau of
Urban and Community Programs Evaluation reviews the project's (1) objectives,
(2) sampling plan, (3) design, (4) data analysis techniques, and
(5) plan of presenting the effects of the special learning treatments
(activities). If any one of the areas is found wanting, a recommendation
to disapprove that project is automatically sent to the appropriate
approving office.

Objectives. An affirmative answer to the following three questions
is prerequisite to the construction of acceptable proposal objectives.¹

1. Is the objective stated in behavioral terms for the learner? 

The objective must clearly define what behavioral change 
(growth) will take place as a result of the treatment. 

2. Is the anticipated performance level precisely stated? The 
proposal writer needs to indicate what degree of change con- 
stitutes successful attainment of that objective. 

3. Does the objective contain the criteria that define how the 
reviewer knows that a change has taken place? The means by which 
evidence of the change will be demonstrated must be included in 
the objective. 



¹For a comprehensive approach to framing objectives in behavioral terms,
see Preparing Program Objectives: Proposal Guidelines for Categorically
Funded Programs, available from The University of the State of New York,
The State Education Department, Bureau of Urban and Community Programs
Evaluation, Albany, New York, 12224.

In some cases, the proposal writer may wish to indicate what 
proportion of the treatment sample will be considered for the successful 
attainment of the objective. 

Below are three illustrations that meet the requisites just listed. 



Illustration A (Traditional classroom)
    AREA OF BEHAVIORAL CHANGE:  In reading comprehension
    DEGREE OF CHANGE:           the mean of the target population will
                                increase by 1 year
    CRITERION REFERENCE:        as measured by the Metropolitan
                                Achievement Test.

Illustration B (Standard evaluation)
    AREA OF BEHAVIORAL CHANGE:  In mathematical problem solving
    DEGREE OF CHANGE:           the target population will demonstrate
                                achievement beyond expectation² (p < .05)
    CRITERION REFERENCE:        as measured by the Stanford Achievement
                                Test.

Illustration C (To be used only with criterion referenced treatments)³
    AREA OF BEHAVIORAL CHANGE:  In the mathematical computation of
                                addition
    DEGREE OF CHANGE:           the target population will demonstrate
                                Level [illegible] mastery by the
    CRITERION REFERENCE:        addition of one 5 digit number to another
                                5 digit number, such as 12345 + 67891,
                                without regrouping.



²Expectation as used here means an estimate based upon empirical
computation, usually from district regression analysis for the target population,
or from prediction based upon an individual's regression as described in the
real gain v. anticipated gain design discussed later.



³At the present time, if a district chose to use Illustration C, the
complete set of mastery objectives for every level would have to be submitted
with the project application. At some point in the future it is
anticipated that the Comprehensive Achievement Monitoring System (CAM) will be
refined to the point where a reference by index number will be sufficient
specification.



CHAPTER II: DEVELOPMENT AND APPLICATION OF A SAMPLING PLAN

Defining the Target Population

Although the target population is specified on the application form,
additional information is required in the evaluation section of the project
proposal.

1. The target population must be defined by the characteristics that 
will be emphasized in the treatment of the educational deficiency. 
The prudent district will define as many characteristics of the 
learners selected for treatment as are feasible. Ultimately, the 
district will attempt to correlate the particular treatment that 
is optimal for learners with particular characteristics. 

2. Frequently, a project for disadvantaged learners contains several 
components. Each component is devoted to different activities 
for different educational deficiencies. The separate sub- 
populations by area of treatment must be specified. Districts 
should also indicate which disadvantaged learners will be in- 
cluded in multiple treatments spanning several components, and 
which learners will receive only one treatment for one particular 
educational deficiency. The district is then in a position to 
determine whether a single effort produces the desired results, 
or, whether there is a multiplicative effect due to a concerted 
effort to coordinate several component treatments. 

3. When large numbers of pupils (more than 120) are included in a
component of a project, analyzing the entire target population for
growth as measured by a test is not necessary. Usually, a sample
will exhibit the changes taking place in the entire treatment
population for that particular treatment. Error in an individual's
deviation about a mean will be counterbalanced, and the sample will
approximate the distribution of a much larger population on a
particular characteristic. For a treatment group of fewer than
120 pupils for any individual treatment within a component, the
entire treatment group should be included during the data analysis
phase of the project.

4. When a sampling approach is being used by an evaluator, it is
critical that the method of sampling be described. When sampling
is not done by one of the procedures mentioned below, defensible
inferences about the population cannot be made. Verification as
to the effectiveness of a treatment is not possible under such
circumstances.

Randomized Sampling: The evaluator simply selects students from the
treatment population in an aimless or haphazard fashion until he fills the
size of the sample sought. The most common course taken in random sampling
is to take a table of random numbers and select numbered participants
according to the table's "sequence."

Stratified Random Sampling: The evaluator introduces a variable or
characteristic to the population and then selects the participants randomly
within the subpopulation of that characteristic. For example, consider a
New York City reading project where the total treatment group consisted of
1,500 Puerto Rican students and 4,500 Afro-American students. The evaluator
desired to obtain a sample of 120 participants. Using the ethnic background
as the stratification factor, he would select 30 Puerto Rican students at
random and 90 Afro-American students at random. The evaluator obtained a
proportional stratified random sample.
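
The proportional allocation in this example can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the handbook's procedure: the function name, the pupil ID strings, and the seed are hypothetical, and the quota for each stratum is simply the sample size multiplied by that stratum's share of the population, rounded to a whole pupil.

```python
import random


def proportional_stratified_sample(strata, sample_size, seed=None):
    """Draw a proportional stratified random sample.

    strata: dict mapping a stratum label to the list of pupil IDs in it.
    Returns a dict mapping each label to the randomly chosen IDs.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for label, members in strata.items():
        # Each stratum contributes in proportion to its share of the
        # population. With awkward proportions the rounded quotas may
        # not add up exactly to sample_size.
        quota = round(sample_size * len(members) / total)
        sample[label] = rng.sample(members, quota)
    return sample


# Hypothetical pupil IDs standing in for the two strata in the example.
strata = {
    "Puerto Rican": [f"PR{i:04d}" for i in range(1500)],
    "Afro-American": [f"AA{i:04d}" for i in range(4500)],
}
sample = proportional_stratified_sample(strata, 120, seed=1972)
counts = {label: len(ids) for label, ids in sample.items()}
print(counts)  # {'Puerto Rican': 30, 'Afro-American': 90}
```

Because 1,500 is one quarter of the 6,000 pupils, one quarter of the 120 sample places (30) go to that stratum and the remaining 90 to the other, matching the figures in the example.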






Multistage Sampling: The evaluator randomly selects a unit of the
population and then samples again within that unit. For example, an
evaluator may randomly select several schools from all the schools in
his district conducting reading projects with paraprofessionals, and then
randomly select grades within each school.

Cluster Sampling: The evaluator purposely clusters schools around
one or more factors and then samples within the cluster. Once the clusters
are defined the evaluator is free to select randomly or by stratification.
For example, an evaluator may want to sample second grade Title I remedial
reading students, but grouped by cost per pupil and class size. He
would make a grid or "cell" plan with a cost per pupil axis and a class
size axis. After assigning all second grade Title I remedial reading
classes to the appropriate "cell," he is free simply to select pupils at
random or to stratify his selections (e.g., by sex, past performance, or
ethnic origin) if he so chooses.

While sampling plans can be designed to be extremely sophisticated,
the basic rule in sampling is this: Keep the sample as free from bias as
is possible.




Using a Table of Random Numbers 



When an evaluator needs to compare treatment groups to nontreatment
(but eligible) groups, the evaluator should select the samples at random.
Some evaluators use a roulette wheel, a lottery like the Armed Services
draft, or even a basin filled with well-mixed numbered balls or papers.
Some evaluators are fortunate enough to be able to assign pupils at
random to the treatment group or the regular classroom group at the outset.
Other evaluators (with the coordinator) are limited to assigning a
treatment to a classroom at random. In each case the main principle that is
followed is to select pupils within either the treatment or nontreatment
group without regard to an order or system. In other words, every pupil
within a treatment group or nontreatment group would have the same chance
as every other pupil of being picked to represent the sample. (Actually,
as pupils are picked the population shrinks slightly, so that the
remaining pupils stand a slightly increased chance [better odds] of being
picked.)

When selecting random samples, many evaluators use a table of random 
numbers. Tables of random numbers are usually constructed by computers. 
Every digit that appears in every row or column from 0 to 9 had an equal 
chance (with every other digit from 0 to 9) to appear in that spot. 

The evaluator can read the numbers consecutively in any direction in
the table; that is, horizontally by rows, vertically by columns, or
diagonally up or down. The numbers read represent the pupils to be
selected for consideration for (1) assignment to a treatment or a regular
classroom or (2) selection as test score recipients within a treatment or a
regular classroom.



Below is a section of a table of random numbers:

            MOCK TABLE OF RANDOM NUMBERS

                       COLUMN
    ROW       1-5       6-10      11-15

    01       69122     95199     26699
    02       39418     20224     99094
    03       30033     73090     29531
    04       94068     03488     62386
    05       06088     39952     26216
    06       60935     83696     06316
    07       10704     48969     59596
    08       27427     44103     87646
    09       56401     37655     10515
    10       95603     39622     79952



If the target population has fewer than 100 total pupils eligible for
selection, a two digit number is required. (1) Looking at row 01 and columns 1
and 2, the first pupil picked would be pupil #69. (2) Moving horizontally,
the second pupil would be pupil #12; the third, pupil #29. If a pupil is
picked twice (e.g., row 08, columns 4 and 5, and row 10, columns 10 and 11),
simply skip the second entry and move on until the sample is filled.
(3) Moving vertically, the second pupil would be pupil #39; the third,
pupil #30; etc. (4) Moving and dropping diagonally, the second pupil
would be pupil #41; the third, #37; the fourth, #34; etc.

If the target population is larger than 100 but less than 1,000
(000 to 999), then three digit numbers are required. If the first pupil
were again selected at the starting point of row 01, then columns 1, 2, and
3 are required. Moving horizontally, the first pupil is #691; the second
pupil is #229; pupil number three is #519.
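
The horizontal reading rule just described can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the handbook's own procedure: the function name and its parameters are hypothetical, pupils are assumed to be numbered from 1 up to the population size, and reading is assumed to continue onto the next row when a row runs out. The digits are copied from the mock table.

```python
# Digits of the mock table of random numbers, one string per row.
MOCK_TABLE = [
    "69122 95199 26699",
    "39418 20224 99094",
    "30033 73090 29531",
    "94068 03488 62386",
    "06088 39952 26216",
    "60935 83696 06316",
    "10704 48969 59596",
    "27427 44103 87646",
    "56401 37655 10515",
    "95603 39622 79952",
]


def select_from_table(table, start_row, start_col, population, sample_size):
    """Read the table horizontally and return the pupil numbers selected.

    Pupils are assumed to be numbered 1..population. Numbers outside
    that range, and numbers already picked, are skipped.
    """
    # Two digits for populations under 100, three for 100 to 999, etc.
    width = len(str(population))
    # Join the rows from the starting row onward into one run of digits.
    digits = "".join(row.replace(" ", "") for row in table[start_row - 1:])
    digits = digits[start_col - 1:]
    picked = []
    for i in range(0, len(digits) - width + 1, width):
        number = int(digits[i:i + width])
        if number < 1 or number > population or number in picked:
            continue
        picked.append(number)
        if len(picked) == sample_size:
            break
    # If the table runs out first, fewer than sample_size are returned.
    return picked


two_digit = select_from_table(MOCK_TABLE, 1, 1, population=99, sample_size=3)
three_digit = select_from_table(MOCK_TABLE, 1, 1, population=999, sample_size=3)
print(two_digit)    # [69, 12, 29]
print(three_digit)  # [691, 229, 519]
```

Starting at row 01, column 1, the sketch reproduces the two digit picks (#69, #12, #29) and the three digit picks (#691, #229, #519) given in the text. The same function can be pointed at any other starting row and column.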

To find a starting point in the random numbers table, a common
practice is to take a pencil with the eraser end pointing toward the table,
look away from the table, and quickly thrust the eraser onto the table.
That number covered by the eraser is the starting point in the table.
Another common practice is to roll a pair of dice. Let the digit on one
die represent the starting row, and the digit on the other die the starting
column. The idea behind the blind thrust or the die throwing approach is
to avoid superimposing a "system" of always obtaining the same sequence of
numbers.

Illustration: A Title I coordinator was faced with the problem of
selecting 20 pupils for a remedial reading treatment from a total
population of 70 eligible target pupils. The pupils were on an alphabetical
listing. Almost all of the parents of the 70 pupils wanted
their children to receive the treatment. The Title I coordinator decided
to select the pupils randomly from a table of random numbers. He needed
two digit numbers. He would have to disregard any two digit numbers that
were larger than 70. Using the blind thrust technique for starting in
the sample table above, the coordinator started at row 04, column 4 and
moved horizontally. The numbers read were as follows (crossed-out entries
are shown in brackets):

68, 03, 48, [86], 23, [86], 06, 08, [83], [99], 52,
26, 21, 66, 09, 35, 69, 60, [73], 16, 10,
70, 44, [89], [69], 59, [59], 62

Numbers 86, 83, 99, 73, and 89 are crossed out because they were
greater than 70. The second entries for numbers 69 and 59 were crossed
out because they had already been selected for the sample.

The coordinator was able to assign the 20 pupils to the treatment
class and withstand any charge of favoritism or bias on the grounds of
sex, race, creed, etc., from the parents of eligible pupils not receiving
the Title I remedial reading treatment.⁴

⁴Because an alphabetical list carries its own bias, this method is far
superior to the common practice of systematically selecting every fifth
name on an alphabetical class list.



Illustration: Consider the illustration given above, but assume that a
larger target population existed that was composed of several hundred
pupils. The 20 pupils receiving treatment were again assigned by use of
random numbers from a table. Now, however, a set of about 20 nontreatment
pupil scores is required in the spring to compare the treatment
group with the nontreatment group for achievement in reading comprehension.
The coordinator needs a sample of nontreatment pupil scores since he does
not want to use several hundred scores. Option #1: An alphabetical listing
of nontreatment pupils is prepared and the same procedure is repeated to
yield a random selection of nontreatment pupils for comparison purposes.
Option #2: The original list from which the 20 treatment pupils were
selected is resurrected, and by continuing on in the table a second set of
20 nontreatment pupils is isolated. (In actual practice, option 2 is
usually selected and fulfilled at the same time the random assignment to
the treatment group is undertaken.)



A Quick Method for Approximating the Needed Treatment Group
Sample Size When a Nontreatment Group Will Be Used for Comparison

The method described below is used to estimate sample size prior to
pupil classroom assignment and data collection. The modified McGuigan⁵
approach attempts to answer the question "How many disadvantaged learners
in the Title I treatment group and how many disadvantaged learners in the
nontreatment group should be tested to be sure to demonstrate significant
differences if they exist?" The following steps are suggested:

Step 1. Check last year's Title I treatment group mean and a mean from an
equal number of randomly selected eligible nonparticipants. Subtract the
nontreatment group's mean ($\bar{X}_2$) from the treatment group's mean
($\bar{X}_1$).

Step 2. Calculate the variance for each group separately.

a. If the variances are almost identical, use the following
estimation formula:

$$ n = \frac{2 t^2 S^2}{(\bar{X}_1 - \bar{X}_2)^2} $$

where $n$ = the number of scores in the treatment group
      $S^2$ = the unbiased estimate of the population (common)
              variance
      $t$ = the t ratio for independent means

b. If the variances are considerably different, use the
following formula:

$$ n = \frac{t^2 (S_1^2 + S_2^2)}{(\bar{X}_1 - \bar{X}_2)^2} $$

where $S_1^2$ = the variance of the treatment group
      $S_2^2$ = the variance of the eligible, nontreatment group

⁵Frank J. McGuigan, Experimental Psychology: A Methodological Approach,
2nd ed., Englewood Cliffs: Prentice-Hall, 1968, p. 354.



Step 3. Set the probability level for the desired level of significant
difference (e.g., p ≤ .05). On that basis estimate the value of critical
t (for p ≤ .05, be sure to estimate t > 1.96; for p ≤ .01, set t > 2.58).

Step 4. Compute the sample size from the formula selected in step 2
above.

Illustration: A Title I evaluator was conducting a special
computer oriented remedial mathematics treatment for disadvantaged
learners. Limited funds meant that only 200 pupils out of a target
population of 500 pupils were going to be able to receive the special computer
oriented treatment. Since parents were extremely sensitive as to whose
youngsters would be selected, the pupils were assigned to the treatment or
the regular classroom randomly. During the previous year, while there
were several implementation problems, the treatment group appeared to have
surpassed the regular classroom group on the spring standardized test in
mathematics. However, the previous year's treatment group was composed of
only 50 disadvantaged learners. The question before the evaluator was
"How many pupils are needed in the treatment group and the nontreatment
group to demonstrate a significant difference if there is any?"

First, the evaluator randomly selected a group of 50 eligible
nontreatment pupils from the previous year. He arrayed the data as
follows:

                      Treatment group     Nontreatment group

    Number of pupils        50                   50
    Mean                   124                  120
    Variance               110                  125



The evaluator did not know if the two variances (110 and 125) were
equivalent (homogeneous). So, he decided to elect formula 2b from
above.⁶ He arbitrarily selected a t value of 2.1 (p < .05).

$$ n = \frac{t^2 (S_1^2 + S_2^2)}{(\bar{X}_1 - \bar{X}_2)^2} = \frac{(2.1)^2 (110 + 125)}{(124 - 120)^2} \approx 65 $$

In the spring after the districtwide testing, the evaluator will randomly
select 70 treatment pupils and 70 nontreatment pupils for mean score
comparisons with the t test. The evaluator decided on 70 pupils in each
group since the obtained 65 was the very minimum he needed.
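
The arithmetic of this illustration can be checked with a short Python sketch of the two estimation formulas from step 2. The function and argument names are hypothetical labels chosen for this sketch; the formulas and the input figures are the ones given in the text.

```python
import math


def n_equal_variances(t, common_variance, mean_1, mean_2):
    """Formula 2a: n = 2 t^2 S^2 / (X1 - X2)^2, for near-identical variances."""
    return 2 * t ** 2 * common_variance / (mean_1 - mean_2) ** 2


def n_unequal_variances(t, variance_1, variance_2, mean_1, mean_2):
    """Formula 2b: n = t^2 (S1^2 + S2^2) / (X1 - X2)^2."""
    return t ** 2 * (variance_1 + variance_2) / (mean_1 - mean_2) ** 2


# Figures from the illustration: t = 2.1, variances 110 and 125,
# treatment mean 124, nontreatment mean 120.
n = n_unequal_variances(2.1, 110, 125, 124, 120)
print(round(n, 2))   # 64.77
print(math.ceil(n))  # 65
```

The unrounded result is about 64.77, which rounds up to the 65 pupils per group reported in the illustration. Because the mean difference is squared in the denominator, halving the expected difference between the groups would roughly quadruple the estimate.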



⁶The evaluator could have checked for the homogeneity of the variances
by creating an F ratio. That is,

$$ F = \frac{\text{larger variance}}{\text{smaller variance}} = \frac{S_2^2}{S_1^2} $$

He then would have checked the F table to see if the value there was
exceeded by the value resulting from his ratio.



Determining Approximate Sample Sizes Based Upon the
Desired Degree of Association for Uncorrelated Samples

Hays⁷ has described a method for approximating the size of the
uncorrelated samples needed, when an evaluator wishes to be sure to have
enough subjects to make sure significant differences at selected levels of
association⁸ will show up. The question of determining how many subjects
to test in the Title I treatment group and how many eligible, but
nontreatment (regular classroom) pupils to include can be answered by this
method. This method yields a minimum number.

1. The evaluator must decide the level of significance that will
satisfy his need to reject the null hypothesis (no difference between the
treatment and nontreatment group means).

2. The evaluator must decide what degree of association $\omega^2$ (omega
squared) between the treatments and the variance in the obtained scores
he desires.

3. The evaluator must solve the following equation for delta:

$$ \Delta = 2\sqrt{\frac{\omega^2}{1 - \omega^2}} $$

where $\omega^2$ = the degree of association.

4. The evaluator must solve the following equation for the sample
size of the treatment group and then select an equal number for the
nontreatment group.

$$ n = \frac{2(2.58 + 2.33)^2}{\Delta^2} \quad (p = .01) \qquad \text{or} \qquad n = \frac{2(1.96 + 2.33)^2}{\Delta^2} \quad (p = .05) $$

⁷William L. Hays, Statistics for Psychologists, New York: Holt, Rinehart,
and Winston, 1963, p. 32/.

⁸Sometimes significant differences will show up between measurements taken
from some populations, but the level (strength) of the association may be
very slight (trivial). The evaluator is interested not only in knowing
whether a significant difference existed between groups (and, hence,
treatment effects), but also how much of that difference can be associated with
given treatments.



Illustration: An evaluator was going to posttest the difference between a
special Title I treatment second grade classroom and a regular second grade
classroom in remedial reading. The evaluator desired to assign the pupils
at random to the treatment and regular classroom. The evaluator needed to
know how many students to test to see if the treatment had the effect that
was being claimed by the publisher of the materials for the special
treatment.

Step 1. The p <.01 level was selected as the significant difference 

level. 

Step 2. The evaluator desired at least a .30 association between the 
treatment and the variance in the two groups' achievement scores. (If 
w 2 = .30), then the evaluator is inferring that the treatment accounts for 
approximately 30 percent of the variance in the obtained scores. 

Step 3.

\[ \Delta = 2\sqrt{\frac{.30}{1 - .30}} = 1.30 \]

Step 4.

\[ n = \frac{2(2.58 + 2.33)^2}{(1.30)^2} = 28.5 \]

The evaluator needs at least 29 pupils in the Title I treatment group and another 29 pupils in the regular classroom. The evaluator should select a few more⁹ pupils in each category than the approximate estimate to be assured of reaching his association if there is, indeed, one of .30.

⁹A few extra pupils in the samples are advisable since schools receiving categorical aid are noted for attrition in the target population in any year. In rural upstate New York, the exit rate of pupils is estimated at 8 percent while in urban areas the estimate is close to percent.

Illustration: Consider the same illustration as above, but let the evaluator decide that he can't obtain 29 pupils in each category because the funds are simply not sufficient to implement the treatment for 29 pupils. The evaluator decides to set his level of significance at .05 instead of .01 as in the previous example, but to retain the association of .30 between treatment and scores. Reapplying Step 4, he has

\[ n = \frac{2(1.96 + 2.33)^2}{(1.3)^2} = 21.78 \]

Or, for a total target population (treatment plus nontreatment), he needs at least 44 pupils.

The brief table below indicates the minimum number (n) of pupils needed in each treatment group by level of significance (p ≤ .05; p ≤ .01) for the level (strength) of association (ω²) desired, without adjusting for attrition.

        p ≤ .05                p ≤ .01
      ω²        n            ω²        n
     .10       85           .10      111
     .15       48           .15       81
     .20       37           .20       49
     .25       28           .25       37
     .30       22           .30       29
     .35       17           .35       23
     .40       14           .40       19
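
The four steps above can be sketched in Python as a check on the arithmetic. This is only an illustrative sketch: the function names are hypothetical, the deviates 1.96, 2.58, and 2.33 are taken from the Step 4 equations, and rounding up to the next whole pupil is an assumption. Because the sketch carries delta unrounded, not every result matches the printed table entry.

```python
import math

# Deviates taken from the Step 4 equations in the text.
Z_FOR_SIGNIFICANCE = {0.05: 1.96, 0.01: 2.58}
Z_SECOND_TERM = 2.33


def delta_from_omega_squared(omega_sq):
    """Step 3: delta = 2 * sqrt(omega^2 / (1 - omega^2))."""
    return 2.0 * math.sqrt(omega_sq / (1.0 - omega_sq))


def minimum_group_size(omega_sq, significance=0.01):
    """Step 4: n = 2 * (z + 2.33)^2 / delta^2, rounded up (assumed) to whole pupils."""
    delta = delta_from_omega_squared(omega_sq)
    n = 2.0 * (Z_FOR_SIGNIFICANCE[significance] + Z_SECOND_TERM) ** 2 / delta ** 2
    return math.ceil(n)


if __name__ == "__main__":
    for omega_sq in (0.20, 0.25, 0.30, 0.40):
        print(omega_sq,
              minimum_group_size(omega_sq, 0.05),
              minimum_group_size(omega_sq, 0.01))
```

For an association of .30 the sketch returns 29 pupils per group at the .01 level and 22 at the .05 level, in agreement with the two illustrations.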



In summary, two operations are important when drawing samples and 
making inferences about categorical aid treatments: (1) the sample must be 
composed of sufficient numbers to illuminate significant differences when 
true differences do exist, and (2) given a true difference and correspond- 
ing t value, the strength of an association is required for stating that a 
treatment is important in affecting pupil behavior. 






Estimation of a Required Sample Size 
When Testing Treatment Means 



The method described below can be found in greater detail in chapter 12 of Sampling and Statistics Handbook for Surveys in Education, prepared and published by the Research Division of the National Education Association, 1965. For the formula to be applied effectively, several items of population or sample data are required:

(a) the approximate size of the treatment group (N)

(b) the approximate standard deviation (SD) of the group or of a previous year's sample on the variable under study

(c) the approximate standard error of the mean (SE_X̄) of the group or of a previous year's sample on the variable under study

In addition, the evaluator needs to select a level of confidence (probability) that will be required at the time of the statistical test. The appropriate deviation value (z) that corresponds to this level is simultaneously determined (i.e., for p ≤ .01, z = 2.58).

\[ n = \frac{SD^2}{\dfrac{(SE_{\bar{X}})^2}{z} + \dfrac{SD^2}{N}} \]

where n is the estimated sample size needed.



Illustration: A school district planned to provide 400 (N) disadvantaged fourth grade pupils with ESEA I funded remedial reading treatments. The coordinator wanted to know how many pupils to submit to a pre and post administration of the Metropolitan Achievement Test reading subsections (for a correlated z ratio analysis). From an analysis completed the previous year, a similar fourth grade sample (100) of disadvantaged learners had attained a pretest mean of 2.2 (grade equivalent) with a standard deviation (SD) of .4 and a standard error of the mean (SE_X̄) of .04. The coordinator estimated the random sample size by the formula given above for the proposed statistical test to be interpreted at the .05 level of confidence.

\[ n = \frac{(.4)^2}{\dfrac{(.04)^2}{1.96} + \dfrac{(.4)^2}{400}} = 134 \]

In other words, a sample of 134 randomly selected pupils would represent the district's target population composed of the 400 disadvantaged fourth grade learners. Remember, however, that 134 is the minimum number required and does not allow for pupil mobility in a school year.

(Note: Evaluation contractors repeat this estimation procedure for each grade that will be included in an analysis so that inferences in their reports can be made with a stated degree of accuracy and confidence. Coordinators must be prepared to provide the preliminary data so that reasonably close sample sizes can be estimated.)
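
The estimate in the illustration can be checked with a few lines of Python. The sketch below is one reading of the expression as it is laid out in the illustration, with the squared standard error divided by the deviation value z; the function name `required_sample_size` is hypothetical, and rounding up to whole pupils is an assumption. Evaluated with z = 1.96 this reading comes to roughly 132, and with the deviation value rounded to 2 it comes to 134, so the printed figure of 134 may reflect that rounding.

```python
import math


def required_sample_size(sd, se_mean, z, population_size):
    """n = SD^2 / (SE^2 / z + SD^2 / N), following the layout of the illustration."""
    denominator = se_mean ** 2 / z + sd ** 2 / population_size
    return math.ceil(sd ** 2 / denominator)  # assumed: round up to whole pupils


if __name__ == "__main__":
    for z in (1.96, 2.0):
        n = required_sample_size(sd=0.4, se_mean=0.04, z=z, population_size=400)
        print(f"z = {z}: sample of {n} pupils")
```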






CHAPTER III: METHODOLOGY AND MANAGEMENT PLAN 



Developing an Evaluation Design 

A plan of evaluation should be developed before a project is 
implemented. The purpose of designing an evaluation plan is mainly to be 
sure that changes (growth) in the learner's behavior can be measured. 
Measurable behavioral changes provide the educational feedback upon which 
the improvement of the teaching-learning process depends. Well defined 
evaluation plans solidify the data collection procedures that finally net 
data upon which to base defensible decisions. Below are several general 
evaluation designs appropriate to projects for disadvantaged learners. 

1. Classic Experimental v. Control. This design is used when two equivalent groups of pupils are going to be compared for a change in behavior. The experimental group receives the treatment while the control group does not.¹⁰

Example. A special mathematics computation treatment is to be provided for 30 fourth grade disadvantaged learners who are measured as 2 years below grade level on the New York State PEP Tests. The treatment will be 1 hour per day, 5 days per week for 15 weeks after school in the Title I math lab at the Horace Mann School. A control group, located in the same school, with each control student paired with an experimental group student on at least three characteristics, will be used to make the comparison. Both groups would be tested on one form of a standardized test before the treatment and then again on another form¹¹ of the standardized test after the treatment. The results are then compared (see the section on data analysis).

¹⁰Note: In Title I Projects, this design is only permissible when (a) funds are so limited that the treatment will not reach all eligible disadvantaged learners or (b) the experimental group receives the special treatment the first half of the year and the control group receives the same treatment during the second half of the year.

2. Real Gain v. Anticipated Gain (Others). The real v. anticipated 
gain design is used when a staff can predict the probable number of months 
of achievement for a disadvantaged target population without a specialized 
treatment. The target population is tested before the treatment and after 
the treatment and the difference is compared to the anticipated gain. 
Example. Based on their past experience (which was consistent with the Coleman Report) and on last year's classes, the staff at the John Dewey Elementary School knew that Miss Lernen's and Mr. Klug's third grade classes would show a reading comprehension achievement growth of 5 months on the Metropolitan Achievement Test in June. However, this year categorical aid for the disadvantaged was going to support each classroom with special remedial and developmental reading materials and an education assistant. The target pupils were tested before the treatment and again after the treatment on alternate forms of the Metropolitan Achievement Test. The real gains were compared to the anticipated gains for both classes.

¹¹If a sufficient amount of time has elapsed and the nature of the test is such that pupil recall of previous responses is negligible, the same form of the test may be readministered as a posttest.










3. Real Gain v. Anticipated Gain (Self). This evaluation design is similar to the preceding design, but depends upon a different prediction for the anticipated gain. The disadvantaged learners in the target population have "averaged" an achievement increment gain to date. The anticipated gain is based upon that increment.

Example. Mrs. Wissen's class of third grade reading pupils was tested at the beginning of the year. The mean of the scores in vocabulary was converted to show an average monthly gain of .5 months for every month spent in class. Mrs. Wissen anticipated a mean growth of 5 months for a full school year's experience. However, a categorically funded project supplied word attack skill materials, phonics kits, specially tailored enrichment field trip experiences, and an aide. Mrs. Wissen tested the students at the conclusion of the school year and compared the actual monthly average gain to their anticipated monthly gain.
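
The comparison in this example reduces to simple arithmetic, sketched below in Python. The 10-month school year follows from the example (.5 months per month yielding 5 months for a full year); the months of prior schooling and the pretest and posttest means are invented purely for illustration, and the function names are hypothetical.

```python
def average_monthly_gain(achievement_months, months_in_school):
    """Months of achievement gained per month of schooling to date."""
    return achievement_months / months_in_school


def anticipated_gain(monthly_rate, school_year_months=10):
    """Gain expected over one school year if the past rate simply continues."""
    return monthly_rate * school_year_months


if __name__ == "__main__":
    # Hypothetical class: pretest mean of 10 achievement months after 20 months in school.
    rate = average_monthly_gain(10, 20)
    expected = anticipated_gain(rate)
    # Hypothetical posttest mean of 17 achievement months at the end of the year.
    real = 17 - 10
    print(f"anticipated {expected} months, real {real} months, "
          f"real monthly rate {real / 10} v. anticipated {rate}")
```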

4. Real Gain v. Normalized Gain. This design is appropriate when the evaluator has available a local district norm, State norm, or national norm already established. The target population is tested before and after the treatment as in the previous designs. The difference between the means obtained on the two testings is then compared to the already established norms.

Example. The Martin Luther King, Jr. School is planning to add a reading laboratory with special materials, remedial reading specialists, and educational assistants. Three hundred disadvantaged learners who scored below two grade "equivalent" levels (below the 23rd percentile on the NYS PEP Test) on reading comprehension are to receive individualized instruction in reading comprehension 1 hour per day, three times per week for 25 weeks. Recently, the Iowa Test of Basic Skills normed two new forms of its tests as part of the nationwide norming process in the district to which Martin Luther King, Jr. School belongs. In other words, a district norm exists.

All target pupils were tested on one form of the Iowa Test of Basic Skills before the treatment and then on an alternate form of the test after the treatment. A simple random sample of 120 target pupils was drawn and the difference between the means obtained by the two testings was compared to the district norm (see the chapter on Data Analysis).

Other designs for evaluating student growth or local variations of the designs above may be applied to a compensatory aid project. Fundamentally, the reasoning behind requiring an evaluation design is to quantify growth exhibited by the learner. With objective data obtained through an evaluation design (a) the learner can receive reinforcement (motivation), (b) a particular treatment can be revised according to empirical findings, and (c) defensible decisions regarding the greatest education "yield" based upon cost and achievement can be implemented with the allocation of future categorical aid.

Scheduling and Managing Data Collection 

In addition to the evaluation plan, the schedule of data collection should be specified. The simplest and most widely accepted data collection procedure at the present time is to plan to collect data before the treatment (baseline data) and again after the treatment. When observers are going to collect data during an onsite visit or when questionnaires are going to be released, the time of the year and the time during the project's "life" should be specified. If multiple observations are involved, each observation date should be indicated. Furthermore, each site for each observation should be specified in the project proposal when several schools are included in the same project.

Specifying the Instrumentation 

Included in any evaluation design must be some performance to indicate 
that the behavioral change is exhibited. The most widely accepted means 
used, presently, is the standardized test. Every project proposal should 
specify the standardized test that is going to be used for the data 
collection. 

When locally developed instruments are to be used, the instrument or 
a description of the instrument should be included in the project proposal. 

Locally developed instruments should be constructed according to accepted procedures for obtaining reliable and valid tests.¹²

Rating scales, observer or pupil checklists, questionnaires, and interview schedules should be constructed so that the responses recorded can be quantified, preferably on an equal interval continuum. This practice becomes critical when correlations between student achievement and selected classroom practices or stimuli are desired. Again, copies or descriptions of the rating scales, checklists, questionnaires, and interview schedules should be attached to the project proposal.



¹²For a succinct practical manual devoted to developing objective tests of achievement that would be appropriate for specialized treatments, see: Gronlund, Norman E. Constructing Achievement Tests. Englewood Cliffs, N.J.: Prentice Hall, Inc., pp. IX + 118.






CHAPTER IV: APPLICATION OF DATA ANALYSIS TECHNIQUES 

The data collected as a result of the project evaluation design will have to be analyzed. The techniques that will be employed in the analysis must be specified in the project proposal. Both descriptive and inferential statistical techniques should be included in the data analysis plan.

Describing Change Through Descriptive Statistics 

Such statistics include the mean, median, mode, range, variance, 
standard deviation, and standard scores. Descriptive statistics are 
frequently used in compensatory aid projects to indicate where a sample of 
disadvantaged pupils receiving a special treatment (e.g., remedial reading) 
would be located relative to all disadvantaged pupils deficient in that 
educational area. 

Definitions: 

The mean (X̄) is the arithmetic average of the scores obtained by a measurement. The mean is obtained by adding each pupil's score (X_i) to form a population total (ΣX_i) and then dividing by the number of scores (n = pupils).

The median is the point in any distribution of scores where one-half 
of the scores lie above that point and the other half lie below. 

A quick approximation to the median can usually be obtained by putting 
all the scores in consecutive numerical ascending order and counting 
from the highest score downward until one-half the population is 
reached. 

The range is one plus the difference between the two most extreme 
scores in the distribution of scores. 




The mode is the score that was received most frequently by the target 
population. 



The deviation is the distance on a distribution of scores that indicates how far from the mean a particular score is located.

The population standard deviation is the square root of the sum of every score's deviation from the mean, squared, and divided by the number in the population (n).

\[ SD = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}} \]

The variance is the standard deviation squared. If the distribution is normal (bell shaped), then approximately 68 percent of the total population should fall within one standard deviation of the mean. Ninety-five percent of the total population should fall within two standard deviations of the mean. Ninety-nine percent of the population will fall within three standard deviations of the mean.

A standard score (z) is a pupil's deviation divided by the standard deviation.

\[ z = \frac{X_i - \bar{X}}{SD} \]

The mean and standard deviation are the two important parameters (measures) for assessing the central tendency and spread of a distribution. Frequently, disadvantaged learners' scores lie more than one standard deviation below the mean on a standardized test normed (without regard to disadvantagement) for a particular grade level. Some districts use this as one criterion for selecting disadvantaged students for a particular treatment funded by categorical aid. Other districts, using locally developed instruments, apply descriptive statistics to establish baseline data for future reference after a treatment has been conducted.







The following example illustrates how to obtain each of the 
descriptive statistics just defined. 

EXAMPLE 

On a locally developed word recognition test the following raw scores were obtained from the target population of nine remedial reading pupils (N = 9): 10, 15, 2, 13, 7, 6, 10, 17, 10

                  Raw                       Squared
   Pupil         Score      Deviation      Deviation      z Score
                 (X_i)      (X_i − X̄)     (X_i − X̄)²
   X1              17          +7             49           +1.61
   X2              15          +5             25           +1.15
   X3              13          +3              9           + .69
   X4              10           0              0             0
   X5 (median)     10           0              0             0
   X6              10           0              0             0
   X7               7          −3              9           − .69
   X8               6          −4             16           − .92
   X9               2          −8             64           −1.84

   ΣX = 90      Σ(X_i − X̄) = 0      Σ(X_i − X̄)² = 172





The range was (highest score − lowest score + 1) 16 points.

The mode (most frequently received score) was 10.

The "approximate" median score (midpoint score) was 10.

The mean (arithmetic average) was 10.

Each deviation (X_i − X̄) was found by subtracting the mean from the raw score.

Each deviation was squared in the process of finding the standard deviation.¹³

The standard score (z score) for each pupil was obtained by dividing each pupil's deviation by the standard deviation.

Theoretically, 68 percent of the pupils should fall within the area of the mean ± 4.35. This would include students X3, X4, X5, X6, X7, and X8, who all fall in the area from 14.35 down to 5.65 (six out of nine students = 67 percent). The mean (X̄ = 10) plus or minus two standard deviations (± 2[4.35]) does include all raw scores.
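
The descriptive statistics in this example can be reproduced with Python's standard `statistics` module. The sketch below is illustrative only; the function name `describe` is hypothetical. Carried unrounded, the population standard deviation comes to about 4.37, slightly above the 4.35 used in the text, so computed z scores can differ from the printed ones in the second decimal place.

```python
import statistics


def describe(scores):
    """Return the descriptive statistics defined above for a list of raw scores."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # "biased" SD: divides by n, not n - 1
    return {
        "mean": mean,
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        "range": max(scores) - min(scores) + 1,  # one plus the difference of extremes
        "sum_squared_deviations": sum((x - mean) ** 2 for x in scores),
        "sd": sd,
        "z_scores": [(x - mean) / sd for x in scores],
    }


if __name__ == "__main__":
    raw_scores = [10, 15, 2, 13, 7, 6, 10, 17, 10]
    for name, value in describe(raw_scores).items():
        print(name, value)
```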

Another useful statistic is the standard error of the mean . This 
statistic is used for inferential statistical tests. Basically, the 
standard error of the mean is an estimate of how far the sample mean is 
from the true mean if the universe of the target population were tested. 




\[ SE_{\bar{X}} = \frac{SD_X}{\sqrt{N - 1}} \]

In this example, the standard error of the mean is

\[ SE_{\bar{X}} = \frac{SD_X}{\sqrt{N - 1}} = \frac{4.35}{\sqrt{8}} = \frac{4.35}{2.83} = 1.53 \]
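
As a quick check, the same calculation can be written as a small Python function. The function name is hypothetical; the divisor of the square root of N − 1 is taken from the worked example. Carried unrounded, the result is about 1.54, which the text reports as 1.53.

```python
import math


def standard_error_of_mean(sd, n):
    """Standard error of the mean with the bias correction: SD / sqrt(N - 1)."""
    if n < 2:
        raise ValueError("at least two scores are needed")
    return sd / math.sqrt(n - 1)


if __name__ == "__main__":
    print(round(standard_error_of_mean(4.35, 9), 3))
```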



¹³Throughout this chapter, the standard deviation will be "biased." A correction for bias will be introduced when computing the standard error of the mean.






Interpretations of Norm Scores 



Below is an illustration of scores most widely reported by standardized achievement tests. The illustration is based upon the distribution of pupils' scores as they relate to the entire population upon which the test was standardized.




Standard Scores 

The z score is defined as a standard score. This score for an individual pupil is derived by subtracting the population mean score from the pupil's score and dividing this by the population standard deviation.

\[ z = \frac{X_i - \bar{X}}{SD} = \frac{x}{SD} \]

Sixty-eight percent of the normal distribution of scores will lie between a z score of +1.00 and −1.00. ESEA Title I is largely concerned with assisting disadvantaged learners who obtain scores below z = −1.00.

z scores can also be used for interpretations on teacher made tests.

Illustration: The following sample of scores was obtained from an Afro-American History Test given in five fourth grades.






                Teacher made
   Pupil         test score        X − X̄            z
     A                3              −3        −3/3.41 = − .875
     B                2              −4               −1.168
     C                7              +1               + .292
     D                9              +3               + .875
     E               11              +5               +1.460
     F                4              −2               − .584
     G                1              −5               −1.460
     H                6               0                0.000
     I                7              +1               + .292
     J               10              +4               +1.168

  (N = 10)       Σ = Sum = 60       Σ = 0
                 X̄ = Mean = 6

\[ SD = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}} = \sqrt{\frac{\sum (X - \bar{X})^2}{9}} = 3.41 \text{ (rounded)} \]






Stanine Scores

Stanine is derived from the contraction of the words standard nine. Standard nine means the normal distribution was divided into nine parts. The mean for the distribution is the midpoint of stanine 5. With the exception of stanines 1 and 9, each band of scores within a stanine is roughly one-half of a standard deviation in width. Below is a chart depicting the percentage of pupils within each stanine and the cumulative percentage below each stanine.



   Stanine        1     2     3     4     5     6     7     8     9
   Within (%)     4     7    12    17    20    17    12     7     4
   Below (%)      0     4    11    23    40    60    77    89    96



For example, the New York State Pupil Evaluation Program defines being below 
minimum competence in reading as being below the 4th stanine. This 
definition encompasses the lowest 23 percent of the normal distribution tail 
on the left side of the bell shaped curve. 
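
The stanine chart can be reproduced from the normal curve with a short Python sketch. The cut points used here (z = ±.25, ±.75, ±1.25, ±1.75) are an assumption that follows from the description above: bands one-half of a standard deviation wide with the mean at the midpoint of stanine 5. The function names are hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist

# Assumed z-score cut points between adjacent stanines.
BOUNDARIES = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]


def stanine(z):
    """Stanine (1 through 9) for a given z score."""
    return bisect_right(BOUNDARIES, z) + 1


def percent_within_each_stanine():
    """Percentage of a normal distribution falling in each stanine, rounded."""
    cdf = NormalDist().cdf
    edges = [0.0] + [cdf(b) for b in BOUNDARIES] + [1.0]
    return [round(100 * (hi - lo)) for lo, hi in zip(edges, edges[1:])]


if __name__ == "__main__":
    print(percent_within_each_stanine())
    print(stanine(-1.0))  # a pupil one standard deviation below the mean
```

Under these assumed cut points, a pupil one standard deviation below the mean falls in stanine 3, which is below the 4th stanine.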








Percentile Ranks 



Frequently, standardized tests have a table where raw scores can be 
converted into percentile points. Percentile points are a value. However, 
pupils are usually referred to as having fallen at a specific percentile 
rank, rather than having attained a percentile point. For example, if 78 
percent of the norming population attained less than the score value of 20 
on a particular measurement device, then the value 20 is the 78th percentile 
point. A pupil who receives a score of 20 would simultaneously have attained 
the 78th percentile rank. 

One criterion for determining disadvantaged learners in New York 
State is to survey those pupils who attained a score on the NYS Pupil 
Evaluation Program Reading Test of the 23rd percentile rank or below. 



Large Population Statistical Analysis 



When scores from sample populations in excess of 120 are available, one of the easiest methods of statistical analysis is to apply a z ratio to the differences between two sets of scores. For a z ratio to be significant at the .05 level (p ≤ .05), a value of ±1.96 or greater in magnitude is required. For significance at the .01 level (p ≤ .01), a z value of ±2.58 is required.

z Ratio Applied to Uncorrelated Stanine Means, Posttest Only 
The z ratio is defined as

\[ z = \frac{\bar{X}_1 - \bar{X}_2}{SE_{D_{\bar{X}}}} \]

where X̄₁ and X̄₂ are the means of different samples and SE_{D_X̄} is defined as the standard error of the difference between the two means. (Uncorrelated refers to scores or means from two different samples composed of two different sets of individuals. Another formula is used for a pretest-posttest analysis for two sets of scores yielding a pretest mean and/or posttest mean for the same individuals.)

Illustration:

Two elementary disadvantaged learner schools containing two fourth grade classes were eligible for Title I funded reading activities (the pupils had scored at the 23rd percentile rank or below on the NYS PEP Test the previous year). However, the ESEA Program Office directive (stating that the supplementary expenditure must equal or exceed $350 per child) in reality meant that only one of the classes would get a Title I funded remedial reading treatment. The district evaluator planned to randomly sample within the two schools and administer a pretest in early October and a posttest in late June with the same standardized achievement test.

The evaluator planned to compare the rates of growth between the school receiving Title I funded treatments and the school not receiving treatments as well as the stanine positions of the two eligible populations at the end of the school year.

Unfortunately for the district evaluator, a lengthy teacher "job 
action" (which was resolved) and a series of bomb scares forced the 
evaluator to abolish the pretest-posttest evaluation design. The only 
scores the evaluator was able to obtain were the Stanford Achievement Test 
reading scores derived from the districtwide June testing program. 

One hundred twenty-two (N₁ = 122) fourth grade pupils from the target classrooms receiving Title I treatment were distributed in the bottom 4 stanines. One hundred forty-four (N₂ = 144) fourth grade eligible pupils who did not receive Title I treatment were also distributed in the lower 4 stanines. The district evaluator decided to use a z ratio to determine whether a significant difference (p ≤ .05) existed between the two groups. If a significant difference did exist and favored the treatment group, the evaluator could then infer that Title I funds do assist in bringing about (1) increased achievement and (2) achievement beyond what would have occurred in the regular (nontreatment) classroom.

Below is the way the district evaluator analyzed the data. 



        Treatment Group                  Nontreatment Group
   Stanine    Number of Pupils      Stanine    Number of Pupils
      1             17                 1             35
      2             30                 2             37
      3             35                 3             37
      4             40                 4             35






Step 1: He summed the scores by treatment and nontreatment.

Treatment Sum (Σ) = 1×17 + 2×30 + 3×35 + 4×40 = 342

Nontreatment Sum (Σ) = 360

Step 2: He found the two means. Treatment Mean X̄₁ = 342/122 = 2.8. Nontreatment Mean X̄₂ = 360/144 = 2.5.

Step 3: He found each mean's standard deviation by (a) taking the deviation of each stanine from the mean, (b) squaring the deviation, (c) multiplying the squared deviation by the number of pupils within that stanine, (d) summing these products across stanines, and (e) applying the formula for the standard deviation discussed under the descriptive statistics section above.

Treatment (N₁ = n₁ + n₂ + n₃ + n₄)

                              Deviation              Deviation
   Pupils       Stanine   (stanine value − X̄₁)       Squared      n × (Deviation)²
   n₁ = 17         1            −1.8                   3.24            55.08
   n₂ = 30         2            − .8                    .64            19.20
   n₃ = 35         3            + .2                    .04             1.40
   n₄ = 40         4            +1.2                   1.44            57.60
   N₁ = 122                                                       Σ = 133.28

\[ SD_{X_1} = \sqrt{\frac{133.28}{122}} = \sqrt{1.09} = 1.04 \]

Nontreatment

                              Deviation              Deviation
   Pupils       Stanine   (stanine value − X̄₂)       Squared      n × (Deviation)²
   n₁ = 35         1            −1.5                   2.25            78.75
   n₂ = 37         2            − .5                    .25             9.25
   n₃ = 37         3            + .5                    .25             9.25
   n₄ = 35         4            +1.5                   2.25            78.75
   N₂ = 144                                                       Σ = 176.00

\[ SD_{X_2} = \sqrt{\frac{176.00}{144}} = \sqrt{1.22} = 1.10 \]






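Step 4: He calculated the standard error for each mean.

\[ SE_{\bar{X}_1} = \frac{SD_{X_1}}{\sqrt{N_1 - 1}} = \frac{1.04}{\sqrt{121}} = .0945 \]

\[ SE_{\bar{X}_2} = \frac{SD_{X_2}}{\sqrt{N_2 - 1}} = \frac{1.10}{\sqrt{143}} = .0921 \]

Step 5: He found the standard error of the difference between the two uncorrelated means.

\[ SE_{D_{\bar{X}}} = \sqrt{(SE_{\bar{X}_1})^2 + (SE_{\bar{X}_2})^2} = \sqrt{(.0945)^2 + (.0921)^2} = .132 \]

Step 6: He applied the z ratio.

\[ z = \frac{\bar{X}_1 - \bar{X}_2}{SE_{D_{\bar{X}}}} = \frac{2.8 - 2.5}{.132} = \frac{.3}{.132} = 2.27 \]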
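
Steps 1 through 6 can be sketched in Python working directly from the two stanine frequency tables. The function names are hypothetical, and the sketch carries every intermediate value unrounded, so it arrives at a z ratio of about 2.29 rather than the hand-computed 2.27; the conclusion at the .05 level is the same.

```python
import math


def grouped_mean_sd(counts):
    """counts maps stanine value -> number of pupils; returns (N, mean, biased SD)."""
    n = sum(counts.values())
    mean = sum(stanine * pupils for stanine, pupils in counts.items()) / n
    variance = sum(pupils * (stanine - mean) ** 2
                   for stanine, pupils in counts.items()) / n
    return n, mean, math.sqrt(variance)


def uncorrelated_z_ratio(counts_1, counts_2):
    """z = (mean1 - mean2) / sqrt(SE1^2 + SE2^2), with SE = SD / sqrt(N - 1)."""
    n1, mean1, sd1 = grouped_mean_sd(counts_1)
    n2, mean2, sd2 = grouped_mean_sd(counts_2)
    se1 = sd1 / math.sqrt(n1 - 1)
    se2 = sd2 / math.sqrt(n2 - 1)
    se_difference = math.sqrt(se1 ** 2 + se2 ** 2)
    return (mean1 - mean2) / se_difference


if __name__ == "__main__":
    treatment = {1: 17, 2: 30, 3: 35, 4: 40}
    nontreatment = {1: 35, 2: 37, 3: 37, 4: 35}
    z = uncorrelated_z_ratio(treatment, nontreatment)
    print(round(z, 2), "significant at .05" if abs(z) >= 1.96 else "not significant")
```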






Since the obtained figure of 2.27 was greater than the figure of 1.96 needed for p < .05, the evaluator was able to infer¹⁴ that the Title I funded treatments were having an impact upon the reading difficulties of the disadvantaged learners in such a way as to bring about achievement beyond that which would have occurred in the regular classroom (as shown by the nontreatment group).



¹⁴Statistical tests of analysis do not prove anything. Analysis of this nature only permits the evaluator to make inferences against the probability of making a correct choice. The larger the z ratio, the greater is the probability of making the correct choice. In this illustration, the evaluator had two choices as follows: (#1) the means of the two samples were actually identical and the difference between 2.5 and 2.8 was due solely to sampling variations (chance error), or (#2) the two means were far enough apart to demonstrate a true difference. Choice #1 is called the "null" hypothesis by evaluators. In this case, the evaluator had evidence at a probability level of 95 times in 100 that the true difference existed. On that basis, he rejected choice #1 (the null hypothesis) at the .05 level, thereby accepting choice #2. The evaluator then went beyond the data to account for the difference he computed. Since the sample populations were equivalent and met his assumptions about randomness for a universe of poor readers in the fourth grade, he inferred that the Title I funded treatment caused the difference. The presence or absence of the Title I treatment was defined as the independent variable, while the pupils' reading scores are defined as the dependent variable.

The z ratio is computed the same way as the t ratio. The use of the z ratio automatically means a large sample (N > 120) is involved, while the t ratio usually means a smaller sample. The t ratio is frequently used with Student's t distribution for critical values, while the z ratio involves values straight from the normal curve. (cf. Guilford, J.P.)







Using a Correlated z Ratio on Percentile Scores for a
Modified Real v. Anticipated Gain Design

For a student to maintain his standing at a percentile rank relative 
to a norm, he must gain in achievement as indicated by some measuring device. 
Consider a Title I target population student just beginning ninth grade in 
September with a grade equivalent score on the Stanford Reading Achievement 
Test of 6.5. This 6.5 grade equivalent score is approximately equal to a 
percentile rank of 22 for fall ninth grade pupils. To just maintain the same 
22nd percentile rank in the spring, the target population pupil would have 
to gain approximately 7 months. In other words, a grade equivalent score 
of 7.2 is required to hold the 22nd percentile rank in the spring on the 
ninth grade norm, while a grade equivalent score of 6.5 was required the
previous fall. If the pupil gained 5 months (one-half year is 2 months
less than the required 7 months in this illustration) he would lose his
position at the 22nd percentile rank, dropping lower, even though he
actually gained in months of reading achievement. 

Because of the phenomenon of having to run (gain in months) just to stand still (hold the same percentile rank), several interpretations of scores have been given by Title I evaluators. Below are two interpretations.

Option 1. No loss as a gain. If a pupil were at the 23rd percentile rank on a standardized test in the fall and maintained the rank in the spring, he obviously has not come closer to his more educationally advantaged peers. However, since he had to achieve just to not lose his rank at the 23rd percentile, his deterioration in educational achievement has been arrested. In other words, the treatment is sometimes reported to be "successful" if deterioration is halted. A z ratio (or t ratio) applied to a correlated set of means that showed no significant difference (p ≤ .05, two-tailed test) would verify the cessation of deterioration when a group holds the same percentile rank at the conclusion of a treatment.

Option 2. A Statistically Significant Gain. The interpretation of scores for a group establishing a statistically significant gain in mean percentile ranks is a strong indicator of success of a treatment. To make a statistically significant gain, then, the target population (1) did not lose in rank and (2) did not gain just enough to maintain the rank. The group receiving a significant mean percentile gain has come closer to the more educationally advantaged learners. A z ratio or t ratio applied to a correlated set of percentile rank means must show a significant difference (p ≤ .05) to verify this situation.

When a pretest and posttest are applied to the same individuals, 
separate standard errors of the two means are not required. The z ratio is 
calculated directly from the differences between the same pupil's pretest 
score and posttest score by generating a standard error of the mean of the 
group's differences ($SE_{M_D}$). The statistic called the standard error of 
the mean difference automatically adjusts for the amount of correlation 
present.[15] 

The z ratio is found by generating a mean difference ($\bar{D}$) and dividing 
that difference by the standard error of the mean difference ($SE_{M_D}$): 

$$ z = \frac{\bar{D}}{SE_{M_D}} $$

where d, the deviation of a difference from the mean of the differences, 
is the quantity from which $SE_{M_D}$ is computed. 
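The computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `correlated_z_ratio`, the five pairs of percentile ranks, and the estimator $SE_{M_D} = \sqrt{\sum d^2 / (N(N-1))}$ are assumptions chosen for the sketch (the estimator is the conventional one for paired data). A real application of the z ratio would involve a large group, not five pupils.

```python
import math

def correlated_z_ratio(pretest, posttest):
    """Return (mean_difference, se_md, z) for paired pretest/posttest scores."""
    if len(pretest) != len(posttest) or len(pretest) < 2:
        raise ValueError("need at least two pupils, each with both scores")
    n = len(pretest)
    # D = posttest minus pretest for each pupil
    differences = [post - pre for pre, post in zip(pretest, posttest)]
    mean_difference = sum(differences) / n
    # d = deviation of each difference from the mean difference
    sum_d_squared = sum((D - mean_difference) ** 2 for D in differences)
    # Assumed estimator: SE_MD = sqrt(sum(d^2) / (N * (N - 1)))
    se_md = math.sqrt(sum_d_squared / (n * (n - 1)))
    if se_md == 0:
        raise ValueError("all differences are identical; z is undefined")
    return mean_difference, se_md, mean_difference / se_md

# Hypothetical percentile ranks for five pupils (illustration only).
pre_ranks = [10, 15, 20, 12, 18]
post_ranks = [12, 15, 25, 11, 22]
mean_d, se_md, z = correlated_z_ratio(pre_ranks, post_ranks)
print(f"mean difference = {mean_d:.2f}, SE = {se_md:.3f}, z = {z:.2f}")
```

Because the standard error is built from the pupil-by-pupil differences, no separate correlation coefficient has to be computed.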




[15] If the evaluator chooses to compute the $SE_{M_D}$ by a process similar 
to the one used for uncorrelated samples, then he would use the formula 

$$ SE_{M_D} = \sqrt{(SE_{M_1})^2 + (SE_{M_2})^2 - 2\,r_{12}\,SE_{M_1}\,SE_{M_2}} $$

for the correlated observations. r is computed with the Pearson 
Product-Moment formula. 
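As a rough numerical check, the sketch below computes the standard error of the mean difference two ways: directly from the paired differences, and from the two separate standard errors with the correlation term subtracted. The helper names and the five pairs of scores are hypothetical, and the n - 1 form of the variance is assumed throughout.

```python
import math

def se_of_mean(scores):
    """Standard error of a mean, using the n - 1 variance estimate."""
    n = len(scores)
    mean = sum(scores) / n
    return math.sqrt(sum((x - mean) ** 2 for x in scores) / (n * (n - 1)))

def pearson_r(xs, ys):
    """Pearson product-moment correlation between paired scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def se_md_from_r(pre, post):
    """SE of the mean difference from separate SEs and r (correlated case)."""
    se1, se2 = se_of_mean(pre), se_of_mean(post)
    r = pearson_r(pre, post)
    return math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)

def se_md_direct(pre, post):
    """SE of the mean difference computed from the differences themselves."""
    return se_of_mean([b - a for a, b in zip(pre, post)])

# Hypothetical paired scores (illustration only).
pre = [10, 15, 20, 12, 18]
post = [12, 15, 25, 11, 22]
via_r = se_md_from_r(pre, post)
direct = se_md_direct(pre, post)
print(f"via r: {via_r:.4f}   direct: {direct:.4f}")
```

The two routes agree, which is why the difference method can skip the correlation step.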






Example. A district was planning to initiate a remedial reading treatment 
for all third grade pupils in one school. All of the pupils in that third 
grade who had scored at or below the 23rd percentile rank were eligible and 
were going to participate in the Title I treatment. No nontreatment 
eligible group was available for comparison. 

A pretest from a standardized reading test was administered to 138 
pupils and the percentile rank was obtained for each pupil. The posttest 
was administered to 131 pupils and the percentile rank again obtained for 
each pupil. The scores of seven pupils (who participated in the pretest 
but not in the posttest) were deleted from consideration. The 
z ratio is computed in the following manner: 

Step 1. Each pupil's pretest percentile rank is subtracted from his 
posttest percentile rank ($X_{post} - X_{pre} = D$). The differences are then 
summed ($\sum D$). This sum is divided by the size of the sample, that is, 
the number of paired scores (N = 131). A mean difference has been 
obtained ($\bar{D}$). 

Step 2. Subtract the mean difference from each pupil's difference 
($D_i - \bar{D} = d_i$). Square the deviations obtained for each pupil. Sum the 
squared deviations ($\sum d^2$). 



Step 3. Compute $z = \bar{D} / SE_{M_D}$. Enter the figures. The statistical 
principle involved is to test the difference of the mean difference from 
zero. 

Step 4. Interpret the obtained z ratio at p < .05, where a z of 
±1.96 is significant. 

a. If z is negative and larger than 1.96 in absolute value (e.g., -2.1), 
a significant loss in percentile rank was obtained by the group. 






b. If z is either positive or negative but less than 1.96 in absolute 
value, no significant change can be attributed to the treatment. 
However, under option 1 above, where no loss = a gain, 
the pupils have not fallen further behind their more 
educationally advantaged peers. 

c. If z is positive and greater than 1.96, then a significant 
gain in percentile rank for the group was obtained, and 
the treatment appears to be helping the pupils "catch up" 
to their more advantaged peers (see option 2 above). 
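The three-way reading of the z ratio in cases a through c can be written as a small decision rule. The function name `interpret_z` and the wording of the returned labels are hypothetical; only the 1.96 cutoff at the .05 level comes from the text.

```python
def interpret_z(z, critical=1.96):
    """Classify a correlated z ratio per cases a, b, and c (two-tailed, .05)."""
    if z < -critical:
        # Case a: the group lost percentile rank.
        return "significant loss"
    if z > critical:
        # Case c: the group gained rank (option 2).
        return "significant gain"
    # Case b: rank was held; under option 1 this counts as no loss.
    return "no significant change"

for example_z in (-2.1, 0.8, 2.4):
    print(example_z, "->", interpret_z(example_z))
```

Whether "no significant change" is reported as a success depends on which of the two options the evaluator adopted before the data were collected.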






Small Sample Statistical Analysis 



Statistical tests of inference that are applied to small samples 
(N < 120) in compensatory aid projects rest upon several assumptions. One 
primary assumption involved is that the sample available belongs to a larger 
population (i.e., of disadvantaged learners). Furthermore, a second 
assumption is that any descriptive statistic obtained from the sample 
(i.e., the sample's mean reading score on a test) is an estimate of the 
population's parameter (the true population mean reading score). Since a 
sample estimate may be slightly different from the population parameter, 
evaluators demand that the error of the estimate be accounted for. By way 
of illustration, if a pretest mean in a Title I prekindergarten were 
obtained on the Peabody Picture Vocabulary Test in November, the evaluator 
would want to know whether (1) a posttest mean obtained in May was 
significantly different; or whether (2) the posttest mean was so close to 
the pretest mean that the error involved in each testing overlapped to 
the degree that the posttest mean really was the same as the pretest mean. 
Inferential statistical procedures attempt to answer this question: How 
far apart do two estimates (e.g., sample means) have to be before an 
evaluator can feel "safe" in declaring that a genuine behavioral change 
due to treatment intervention has occurred? 

In the sections below, two types of inferential statistical tests are 
described. The first type, called the parametric tests, is based upon the 
assumption that (1) some characteristics within the population are known 
and that the sample will possess these characteristics (variables) and 
(2) the distribution of the characteristics is "normal" in the statistical 
sense. The t test, analysis of variance, and analysis of covariance are 
parametric tests described below as they may be applied to compensatory aid 
projects. 

The second type of inferential statistical test mentioned here is 
called nonparametric. Nonparametric tests are used when (1) little is 
known about the population distribution or (2) some characteristics are 
likely to depart from a normal distribution within the population. 
Included below are the most frequently used nonparametric tests: variations 
of the sign test and chi square ($\chi^2$). 

Whenever appropriate, the parametric tests should be used in 
preference to the nonparametric inferential statistical tests. 






Applying a t Ratio to the Difference Between a 
Pretest and Posttest (Correlated Sample) 

Illustration 

Consider a remedial reading teacher who desired to conduct special 
field trip excursions to farms with inner city pupils. Words associated 
with the agrarian dimension of our society seldom came into use in the 
everyday language of the target population. Her belief was that the inner 
city pupils would not recognize or comprehend such words until an association 
was formed. 

The remedial reading teacher tailored a word recognition test to the 
topics to be generated by the field trips. She gave a pretest to a 
randomly selected group of pupils before the trips, and then gave a posttest 
after the trips to the same pupils. The questions before the teacher 
were: Could the scores obtained by the pupils have occurred by chance, or 
did the field trips change the behavior (word recognition) in the target 
population? The teacher could see that most of the pupils had improved 
(some pupils much more than others), but she was uncertain as to how much 
change was enough to assert that the treatment (field trips) was affecting 
the pupils' learning. The remedial reading teacher decided to test the 
difference between the pretest group mean and the posttest group mean with 
a t ratio to see if the difference was only due to chance (testing errors). 
The total score possible on the test was 10 points. 

Ten pupils (N=10) were administered the pretest and posttest. Below 
are data arranged from the two testings. 




Pupil     Posttest    Pretest    Difference (d)    d^2
  1          5           3            +2             4
  2          4           4             0             0
  3          7           5            +2             4
  4          5           2            +3             9
  5          8           3            +5            25
  6          4           3            +1             1
  7          6           7            -1             1
  8          3           3             0             0
  9          7           2            +5            25
 10          6           3            +3             9

N = 10
Sum         55          35           +20            78
Mean         5.5         3.5

$\bar{D} = \sum d / N = 2.0$


The means are 5.5 and 3.5. (The difference between the means is equal 
to the mean of the differences, 2.0.) The sum of the differences was 20, 
while the sum of the squares of the differences was 78. 

$$
\begin{aligned}
t &= \frac{\sum d}{\sqrt{[N\sum d^2 - (\sum d)^2]/(N-1)}} \\
  &= \frac{+20}{\sqrt{[10(78) - (20)^2]/(10-1)}} \\
  &= \frac{+20}{\sqrt{380/9}} = \frac{+20}{\sqrt{42.22}} = \frac{+20}{6.5} = 3.08
\end{aligned}
$$



For correlated samples (same sample population under two observations) the 
degrees of freedom = df = N - 1 = 9. 

The critical value of t for 9 degrees of freedom is 2.262 at p < .05. Since 
the obtained 3.08 is greater than 2.262, a significant difference exists 
between the pretest and posttest scores. The teacher can infer that the 
difference may have occurred as a result of the treatment. (Without a 
control group for comparison, the teacher cannot be as certain in this 
inference.) 
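For readers who want to check the arithmetic by machine, the following sketch repeats the calculation with the ten pupils' scores from the illustration. The function name `correlated_t_ratio` is hypothetical; the formula and the critical value of 2.262 are taken from the text.

```python
import math

# Scores from the field trip illustration (pupils 1 through 10).
pretest = [3, 4, 5, 2, 3, 3, 7, 3, 2, 3]
posttest = [5, 4, 7, 5, 8, 4, 6, 3, 7, 6]

def correlated_t_ratio(pre, post):
    """Return (sum_d, sum_d_squared, t, df) for paired scores."""
    n = len(pre)
    d = [b - a for a, b in zip(pre, post)]  # posttest minus pretest
    sum_d = sum(d)
    sum_d_squared = sum(x * x for x in d)
    # t = sum(d) / sqrt([N * sum(d^2) - (sum d)^2] / (N - 1))
    t = sum_d / math.sqrt((n * sum_d_squared - sum_d ** 2) / (n - 1))
    return sum_d, sum_d_squared, t, n - 1

CRITICAL_T_DF9 = 2.262  # two-tailed, .05 level, 9 degrees of freedom
sum_d, sum_d_squared, t, df = correlated_t_ratio(pretest, posttest)
print(f"sum d = {sum_d}, sum d^2 = {sum_d_squared}, t = {t:.2f}, df = {df}")
print("significant" if abs(t) > CRITICAL_T_DF9 else "not significant")
```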






Actual Posttest Comparison to the Predicted 
Posttest Scheme of Data Analysis Using a t Ratio 

Real (treatment posttest) v. anticipated (without treatment) posttest design 

Step 1. Obtain each pupil's pretest grade equivalent. 

Step 2. Subtract 1 (since most standardized tests start at 1.0). 

Step 3. Divide the figure obtained in step 2 by the number of months the 
pupil has been in school to obtain a hypothetical (historical 
regression) rate of growth per month. (Ignore kindergarten months; 
1 school year = 10 months.) 

Step 4. Multiply the number of months of Title I treatment by the 
historical rate of growth. 

Step 5. Add the figure obtained in step 4 to the pupil's pretest grade 
equivalent (step 1). 

Step 6. Test the difference for significance between the group predicted 
posttest mean and the obtained posttest mean with a correlated 
t test. 
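Steps 1 through 5 amount to a one-line projection for each pupil, sketched below. The function name `predicted_posttest` and the sample values are assumptions for illustration: a pupil with a pretest grade equivalent of 2.5 who has completed 30 months of school (three school years, as would be typical for a pupil entering fourth grade) and who receives 8 months of treatment.

```python
def predicted_posttest(pretest_ge, months_in_school, treatment_months):
    """Project a no-treatment posttest grade equivalent (steps 1 through 5)."""
    if months_in_school <= 0:
        raise ValueError("months_in_school must be positive")
    # Steps 2 and 3: historical growth per month, measured from the 1.0 floor
    monthly_rate = (pretest_ge - 1.0) / months_in_school
    # Step 4: growth expected over the treatment period without treatment
    expected_gain = monthly_rate * treatment_months
    # Step 5: add the expected gain to the pretest grade equivalent
    return pretest_ge + expected_gain

# Assumed example values (not prescribed by the steps themselves).
prediction = predicted_posttest(pretest_ge=2.5, months_in_school=30,
                                treatment_months=8)
print(f"predicted posttest grade equivalent: {prediction:.1f}")
```

Step 6 then compares these predictions, pupil by pupil, with the posttest scores actually obtained.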

In September, a diagnostic reading teacher administered the 
Metropolitan Achievement Test (as a pretest) to 30 disadvantaged fourth 
grade learners who had scored below minimum competency on the New York 
State Reading PEP Test. 

The 30 pupils participated for the first time in an ESEA Title I 
remedial project conducted from the first week in October through the last 
week in May (treatment time = 8 months). The reading diagnostician 
readministered an equivalent level form of the Metropolitan Achievement Test 
(as a posttest) during the first week of June to the 30 pupils. 







From the September (pretest) administration, the diagnostician 
calculated the individualized predicted June scores based upon the pupils' 
historical rate of gain (using the method described in steps 1 through 5 
above) that would have been anticipated if the ESEA Title I treatment had 
not intervened in addition to the regular classroom reading instruction. 

The diagnostician then compared the predicted posttest scores to the actual 
posttest scores by the statistic called the t test (critical ratio) to 
determine whether the 30 pupils' achievement was beyond expectation. 










                      Posttest     Posttest                  Difference
Pupil     Pretest     Predicted    Actual      Difference    Squared
  1         2.5         2.9          3.2         +.3           .09
  2         2.8         3.3          3.5         +.2           .04
  3         2.2         2.5          2.6         +.1           .01
  4         1.8         2.0          2.0          0            .00
  5         2.9         3.4          3.8         +.4           .16
  6         3.0         3.5          3.9         +.4           .16
  7         2.8         3.3          3.2         -.1           .01
  8         2.5         2.9          3.2         +.3           .09
  9         2.3         2.7          2.8         +.1           .01
 10         2.0         2.3          2.8         +.5           .25
 11         2.1         2.4          3.0         +.6           .36
 12         2.7         3.1          3.2         +.1           .01
 13         2.0         2.3          2.5         +.2           .04
 14         2.5         2.9          3.5         +.6           .36
 15         2.4         2.8          2.7         -.1           .01
 16         2.2         2.5          2.7         +.2           .04
 17         2.6         3.0          3.2         +.2           .04
 18         2.3         2.7          2.9         +.2           .04
 19         2.2         2.5          3.0         +.5           .25
 20         2.5         2.9          3.7         +.8           .64
 21         2.3         2.7          2.9         +.2           .04
 22         2.8         3.3          3.9         +.6           .36
 23         1.5         1.6          1.8         +.2           .04
 24         2.7         3.1          3.4         +.3           .09
 25         2.3         2.7          3.1         +.4           .16
 26         2.5         2.9          3.2         +.3           .09
 27         2.1         2.4          2.8         +.4           .16
 28         2.2         2.5          3.0         +.5           .25
 29         2.3         2.7          3.6         +.9           .81
 30         2.7         3.1          3.0         -.1           .01

N = 30
SUM                    82.9         92.1        +9.2          4.62
MEAN                    2.76         3.07
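As a hedged check on the column totals, the sketch below recomputes the differences from the predicted and actual posttest columns and applies the same correlated t formula used in the ten-pupil illustration. The variable names are hypothetical, and the critical value of 2.045 (two-tailed, .05 level, 29 degrees of freedom) is taken from a standard t table as an assumption of this sketch.

```python
import math

# Predicted and actual posttest grade equivalents, pupils 1 through 30.
predicted = [2.9, 3.3, 2.5, 2.0, 3.4, 3.5, 3.3, 2.9, 2.7, 2.3,
             2.4, 3.1, 2.3, 2.9, 2.8, 2.5, 3.0, 2.7, 2.5, 2.9,
             2.7, 3.3, 1.6, 3.1, 2.7, 2.9, 2.4, 2.5, 2.7, 3.1]
actual = [3.2, 3.5, 2.6, 2.0, 3.8, 3.9, 3.2, 3.2, 2.8, 2.8,
          3.0, 3.2, 2.5, 3.5, 2.7, 2.7, 3.2, 2.9, 3.0, 3.7,
          2.9, 3.9, 1.8, 3.4, 3.1, 3.2, 2.8, 3.0, 3.6, 3.0]

n = len(predicted)
d = [a - p for p, a in zip(predicted, actual)]  # actual minus predicted
sum_d = sum(d)
sum_d_squared = sum(x * x for x in d)

# Same formula as the ten-pupil illustration:
# t = sum(d) / sqrt([N * sum(d^2) - (sum d)^2] / (N - 1))
t = sum_d / math.sqrt((n * sum_d_squared - sum_d ** 2) / (n - 1))
df = n - 1

CRITICAL_T_DF29 = 2.045  # assumed from a standard t table (two-tailed, .05)
print(f"N = {n}, sum d = {sum_d:.1f}, sum d^2 = {sum_d_squared:.2f}")
print(f"t = {t:.2f} with {df} degrees of freedom")
print("significant" if abs(t) > CRITICAL_T_DF29 else "not significant")
```

The recomputed sums agree with the table's totals of +9.2 and 4.62, and the resulting t of about 6.7 is well beyond the assumed critical value.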



